Improving Minority Class Prediction Using Case-Specific Feature Weights

Authors

  • Claire Cardie
  • Nicholas Howe
Abstract

This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve the performance of minority class predictions. Each variation creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.
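The baseline the abstract describes, an information-gain-weighted CBL (nearest-neighbor) classifier over symbolic features, can be sketched roughly as follows. This is a minimal illustration, not the paper's exact implementation: it uses global information gains and a 1-NN weighted-overlap metric; the paper's variants would replace the global gains with gains computed along the test case's decision-tree path.

```python
import numpy as np
from collections import Counter

def information_gain(X, y, feature):
    """Entropy reduction obtained by splitting on one discrete feature."""
    def entropy(labels):
        counts = np.array(list(Counter(labels).values()), dtype=float)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()
    base = entropy(y)
    cond = 0.0
    for v in np.unique(X[:, feature]):
        mask = X[:, feature] == v
        cond += mask.mean() * entropy(y[mask])
    return base - cond

def weighted_1nn(X_train, y_train, x_test, weights):
    """1-NN retrieval with a feature-weighted overlap (mismatch) metric."""
    dists = (weights * (X_train != x_test)).sum(axis=1)
    return y_train[np.argmin(dists)]

# Weight each feature by its (here, global) information gain, then retrieve.
def ig_weighted_predict(X_train, y_train, x_test):
    weights = np.array([information_gain(X_train, y_train, f)
                        for f in range(X_train.shape[1])])
    return weighted_1nn(X_train, y_train, x_test, weights)
```

Because mismatches on high-gain features cost more, retrieval is dominated by the features most predictive of the class, which is exactly what the case-specific variants then tailor per test case.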


Similar Articles


A Minimum Risk Metric for Nearest Neighbor Classification

nale. Retrieval in a prototype-based case library: A case study in diabetes therapy revision. CH97] C. Cardie and N. Howe. Improving minority class prediction using case-specific feature weights. CS93] Scott Cost and Steven Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. DP97] Pedro Domingos and Michael Pazzani. On the optimality of the simple Bayesian classi...

Full Text

SMOTEBoost: Improving Prediction of the Minority Class in Boosting

Many real world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually produces biased classifiers that have a higher predictive accuracy over the majority class(es), but poorer predictive accuracy over the minority class. SMOTE (Synthetic Minority Over-sampling TEchnique) is spe...
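Although the snippet above is truncated, the core SMOTE idea it names, synthesizing new minority examples by interpolating between a minority point and one of its minority-class nearest neighbors, can be sketched as follows. This is a simplified illustration with assumed names, not the full SMOTEBoost procedure:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from the chosen point to all minority points.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]      # skip the point itself
        j = rng.choice(nbrs)
        gap = rng.random()                 # random position along the segment
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)
```

Because every synthetic point lies on a segment between two real minority points, over-sampling broadens the minority region instead of merely duplicating instances.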

Full Text

Improving the Quality of Minority Class Identification in Dialog Act Tagging

We present a method of improving the performance of dialog act tagging in identifying minority classes by using per-class feature optimization and a method of choosing the class based not on confidence, but on a cascade of classifiers. We show that it gives a minority class F-measure error reduction of 22.8%, while also reducing the error for other classes and the overall error by about 10%.

Full Text

Evaluation of Classifiers in Software Fault-Proneness Prediction

Software reliability depends on its fault-prone modules: the fewer fault-prone units a piece of software contains, the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...

Full Text


Journal:

Volume   Issue

Pages  -

Publication year: 1997